Access published May 17, 2007
Author
Abstract
… any given PCR reaction may contain PCR chimeras (Cronn et al., 2002) and biases in the success rate for cloning the two haplotypes (Morrell et al., 2006). Also, AS-PCR can fail to prime specifically, resulting in genotyping errors. Cloning and AS-PCR are also vulnerable to human and software errors, including sequence assembly problems, sample mislabeling, misinterpretation of sequence traces, and sample contamination. While careful sequencing protocols are likely to minimize errors, a tool that identifies a subset of unlikely base calls requiring re-inspection will be extremely valuable when high data quality is essential.
Here we present a method for error detection in which haplotypes are examined three SNPs at a time. We refer to the method as EDUT (Error Detection Using Triplets), and we present and test a Perl script implementing it. EDUT is most sensitive to "switch errors", SNPs occurring on an incorrect haplotype background. Detecting switch errors is difficult because the base calls that lead to incorrect inference of haplotypes frequently occur at known SNPs. In the absence of a method for evaluating the likelihood of SNP arrangements, an error can appear to be a likely base call. Switch errors are often repeatable and can result from machine-scored data. There is currently no routine procedure for detecting this type of error. In a previous paper, the EDUT method was used to identify switch errors in haplotype data from a resequencing project (Morrell et al., 2006).
Researchers have avoided the problems associated with direct sequencing of nuclear genes in diploid organisms by focusing on mitochondrial and chloroplast loci, which are not subject to the sequencing difficulties caused by heterozygosity, or on organisms in which inbreeding or genetic manipulation results in homozygosity at nuclear genes (Wright and Gaut, 2005).
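The idea of examining haplotypes three SNPs at a time can be illustrated with a small sketch. This is not the authors' Perl script, and the text above does not specify how EDUT scores triplets. The sketch assumes, purely for illustration, that triplets are taken as consecutive windows of three SNPs and that a three-SNP configuration seen in very few haplotypes is worth re-inspecting. The function name `flag_rare_triplets` and the threshold `max_count` are hypothetical.

```python
from collections import Counter


def flag_rare_triplets(haplotypes, max_count=1):
    """Flag haplotypes carrying rare three-SNP configurations.

    ASSUMPTIONS (not taken from the paper): triplets are consecutive
    windows of three SNP positions, and a configuration is "rare" if
    it occurs in at most `max_count` haplotypes of the sample.

    `haplotypes` is a list of equal-length strings, one allele per SNP.
    Returns a list of (haplotype index, SNP positions, alleles) tuples.
    """
    if not haplotypes:
        return []
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("all haplotypes must cover the same SNPs")

    flagged = []
    for start in range(n_sites - 2):
        sites = (start, start + 1, start + 2)
        # Count how often each three-SNP configuration occurs in the sample.
        counts = Counter(tuple(h[s] for s in sites) for h in haplotypes)
        for idx, hap in enumerate(haplotypes):
            config = tuple(hap[s] for s in sites)
            if counts[config] <= max_count:
                flagged.append((idx, sites, config))
    return flagged


if __name__ == "__main__":
    # Two common haplotypes plus one individual whose third SNP was
    # assigned to the wrong background (a simulated switch error).
    sample = ["ACGT"] * 3 + ["GTAC"] * 3 + ["ACAT", "GTGC"]
    for idx, sites, config in flag_rare_triplets(sample):
        print(idx, sites, "".join(config))
```

In this toy sample only the last two haplotypes (indices 6 and 7) are flagged, because the alleles at their third SNP sit on a background where no other haplotype carries them. Flagged calls are candidates for re-inspection of the sequence traces, not confirmed errors, since a genuinely rare haplotype would be flagged in the same way.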
As population studies are expanded to a wider range of diploid organisms (including humans), a method with the broad applicability demonstrated by EDUT will be extremely useful. Published methods for detecting errors in sequencing and phasing, particularly for data from humans, have made use of the relatedness of haplotypes sampled in parents, progeny and … Because haplotypes are passed from parents to progeny with a very small probability of being altered by mutation or rearranged by recombination, nucleotide sequence information from related individuals with a known pedigree is useful for imputation of missing data and haplotype reconstruction (Li and Jiang, 2005), and for error detection …